The Essential Dynamics Algorithm: Fast Policy Search In Continuous Worlds

Author

  • Martin C. Martin
Abstract

This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces that trades speed for accuracy. The algorithm can be seen as a generalization of linear quadratic control to nonlinear, non-regulation problems. A transform of the stochastic MDP into a deterministic one is presented; the deterministic MDP captures the essence of the original dynamics, in a sense made precise. In this transformed MDP, the calculation of values is greatly simplified. The online algorithm estimates the model of the transformed MDP and simultaneously does policy search against it. Bounds on the error of this approximation are proven, and experimental results are presented in both a bicycle-riding domain and the control of a robot arm on a dynamic base, a 14-dimensional state space. The algorithm learns near-optimal policies in orders of magnitude fewer interactions with the stochastic MDP, using less domain knowledge. Code is available on the project’s web site.

Similar resources

Controlling Cardea: Fast Policy Search in a High Dimensional Space

The essential dynamics algorithm is a novel policy search algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces. We apply it to the control of a 5 degree of freedom robot arm atop a Segway base. Movement of the arm causes the base to translate and tilt, which in turn affects the movement of the arm. The state space has 14 dimens...

Adaptive search area for fast motion estimation

In this paper a new method for determining the search area for motion estimation algorithm based on block matching is suggested. In the proposed method the search area is adaptively found for each block of a frame. This search area is similar to that of the full search (FS) algorithm but smaller for most blocks of a frame. Therefore, the proposed algorithm is analogous to FS in terms of reg...

DISCRETE SIZE AND DISCRETE-CONTINUOUS CONFIGURATION OPTIMIZATION METHODS FOR TRUSS STRUCTURES USING THE HARMONY SEARCH ALGORITHM

Many methods have been developed for structural size and configuration optimization in which cross-sectional areas are usually assumed to be continuous. In most practical structural engineering design problems, however, the design variables are discrete. This paper proposes two efficient structural optimization methods based on the harmony search (HS) heuristic algorithm that treat both discret...

Gravitational Search Algorithm to Solve the K-of-N Lifetime Problem in Two-Tiered WSNs

Wireless Sensor Networks (WSNs) are networks of autonomous nodes used for monitoring an environment. In designing WSNs, one of the main issues is the limited energy source of each sensor node. Hence, there is a strong need for ways to optimize energy consumption in WSNs, which eventually increases the network lifetime. Gravitational Search Algorithm (GSA) is a novel stochastic population-based meta-...

A Novel Continuous KNN Prediction Algorithm to Improve Manufacturing Policies in a VMI Supply Chain

This paper examines and compares various manufacturing policies which a manufacturer may adopt so as to improve the performance of a vendor managed inventory (VMI) partnership. The goal is to maximize the combined cumulative profit of the supply chain while minimizing relevant inventory management costs. The supply chain is a two-level system with a single manufacturer and single retailer at each lev...

Publication date: 2004